Empirical Studies of Code Clone Genealogies
نویسندگان
چکیده
Two identical or similar code fragments form a clone pair. Previous studies have identified cloning as a risky practice. Therefore, a developer needs to be aware of any clone pairs so as to properly propagate any changes between clones. A clone pair experiences many changes during the creation and maintenance of software systems. A change can either maintain or remove the similarity between clones in a clone pair. If a change maintains the similarity between clones, the clone pair is left in a consistent state. However, if a change makes the clones no longer similar, the clone pair is left in an inconsistent state. The set of states and changes experienced by clone pairs over time form an evolution history known as a clone genealogy. In this thesis, we provide a formal definition of clone genealogies, and perform two case studies to examine clone genealogies. In the first study, we examine clone genealogies to identify fault-prone " patterns " of states and changes. We also build prediction models using clone metrics from one snapshot and compare them to models that include historical evolutionary information about code clones. We examine three long-lived software systems and identify clones using Simian and CCFinder clone detection tools. The results show that there is a relationship between the size of the clone and the time interval between changes and fault-proneness of a clone pair. Additionally, we show that adding evolutionary information increases the precision, recall, and F-Measure of fault prediction models by up to 26%. In our second study, we define 8 types of late propagation and compare them to other forms of clone evolution. Our results not only verify that late propagation is more harmful to software systems, but also establish that some specific cases of late propagations are more harmful than others. Specifically, two cases are most risky: (1) when a clone experiences inconsistent changes and then a re-synchronizing change without any modification to the other clone in a clone pair; and (2) when two clones undergo an inconsistent modification followed by a re-synchronizing change that modifies both the clones in a clone pair. iii Co-Authorship The case study in Chapter 5 of this thesis is based on a published paper [1] co-authored with my supervisor, Dr. Ying Zou and Dr. Foutse Khomh, a post-doctoral fellow in our lab. The content of Chapter 3, which describes our approach for both case studies, is …
منابع مشابه
Empirical Studies of Clone Mutation and Clone Migration in Clone Genealogies
Duplications and changes made on code segments by developers form code clones. Cloned code segments are exactly the same or have a particular similarity. A set of cloned code segments that have the same similarity with each other become a clone group. A clone genealogy contains several clone groups in different revisions and time periods. Based on different textual similarities, there are three...
متن کاملDetection and Analysis of Near-Miss Clone Genealogies
It is believed that identical or similar code fragments in source code, also known as code clones, have an impact on software maintenance. A clone genealogy shows how a group of clone fragments evolve with the evolution of the associated software system, and thus may provide important insights on the maintenance implications of those clone fragments. Considering the importance of studying the e...
متن کاملAn empirical study of faults in late propagation clone genealogies
Two similar code segments, or clones, form a clone pair within a software system. The changes to the clones over time create a clone evolution history. In this work, we study late propagation, a specific pattern of clone evolution. In late propagation, one clone in a clone pair is modified, causing the clone pair to diverge. The code segments are then reconciled in a later commit. Existing work...
متن کاملAn Empirical Study of Long-Lived Code Clones
Previous research has shown that refactoring code clones as soon as they are formed or discovered is not always feasible or worthwhile to perform, since some clones never change during evolution and some disappear in a short amount of time, while some undergo repetitive similar edits over their long lifetime. Toward a long-term goal of developing a recommendation system that selectively identif...
متن کاملAn empirical study on inconsistent changes to code clones at the release level
To study the impact of code clones on software quality, researchers typically carry out their studies based on fine-grained analysis of inconsistent changes at the revision level. As a result, they capture much of the chaotic and experimental nature inherent in any ongoing software development process. Analyzing highly fluctuating and short-lived clones is likely to exaggerate the ill effects o...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012